ABSTRACT

For data represented by networks, the community structure of the
underlying graph is of great interest. Instead of uncovering the
overall "best" partition of nodes into communities, a more elaborate
description is proposed in which community structures are identified
at different scales. To this end, we take advantage of the local and
scale-dependent information encoded in graph wavelets. After developing
graph wavelets further, namely by introducing proper scale boundaries,
physical scales and scaling functions, we propose a method to mine for
communities in a scale-dependent manner. It relies on clustering nodes
according to their wavelets or scaling functions, using a
scale-dependent modularity. An example on a benchmark graph with
hierarchical communities shows that we successfully uncover its
multiscale structure.

Index Terms — Graph wavelets, community mining, multiscale 
community, spectral clustering 

1. INTRODUCTION 

In an increasing number of applications, such as social networks,
sensor networks, internet networks, neuronal networks, transportation
networks and biological networks, data are naturally represented on
weighted graphs; mining such graphs for relevant information has
emerged in the last decade as a central issue in the more general
study of complex systems [1][2]. A common way of simplifying the
analysis of a network is to separate its nodes into communities, i.e.
groups of nodes that are more densely connected to each other than to
the rest of the network [3]. As nodes in the same community also tend
to share common properties, community mining not only sketches the
structure of a network, it also gives insight into the nodes'
potential hidden properties. One issue in community mining is defining
the scale at which one wants to analyze the network. Many algorithms
(see [3]) are based on the optimisation of appropriate evaluation
functions such as the popular modularity [4], and generally disregard
this question or offer only ad-hoc discussions. Modularity, for
instance, is known to favour an intrinsic scale of description [5][6].

The present work develops a scale-dependent procedure which
identifies community structures of graphs at different scales. Once a
scale of interest (or a collection of scales) is chosen, our objective
is to mine for the communities of the graph at this scale (or these
scales). Using multiple scales then provides a fully multiscale
community description of the graph.
To this end, we rely on the recent construction of graph wavelets
based on spectral graph theory [7], whereas other authors have
proposed multiscale community mining based either on random walk
processes [8][9] or on definitions of parametric modularities
[10][11]. Graph wavelets have been applied to the source estimation
problem for EEG [12] and in image processing [13], but never in the
context of community mining. By nature, the wavelet associated with a
node a and a scale s is centered on this node and spreads over its
neighbourhood, so that the larger s is, the larger the neighbourhood
it spans. In some sense, wavelets give an "egocentered" snapshot of
how a node "sees" the network at that scale. We take advantage of
this local information encoded in wavelets to develop an approach
that clusters together nodes whose local environments are similar,
i.e. whose associated wavelets are correlated. Then, to uncover the
community structure at a given scale, we circumvent the intrinsic
scale issue of the classical modularity by defining a scale-dependent
modularity (using wavelets) whose maximization yields the community
structure at this scale.

In order to use spectral graph wavelets successfully, we first present
preliminary contributions to their theory. We discuss what the proper
scale boundaries are once a graph is given, and we end up with
parameters for the band-pass filter defining the wavelets that differ
from those of [7]. Then we discuss how a physical scale (extension in
number of nodes) can be associated with wavelets. Finally, scaling
functions are introduced for graph wavelets, in analogy with the case
of continuous wavelets. To our knowledge, this is the first
introduction of a proper scaling function for graph wavelets.

Section 2 recalls elements of spectral graph theory and wavelets.
Section 3 presents our three preliminary contributions to the theory
and the use of graph wavelets. In Section 4, the multiscale community
mining with graph wavelets is described, and it is tested on a
standard benchmark in Section 5. We conclude in Section 6.

2. SPECTRAL GRAPH THEORY AND WAVELETS 
2.1. The Graph Fourier Transform 

Let $\mathcal{G} = (V, E, A)$ be an undirected weighted graph with $V$ the
set of nodes, $E$ the set of edges, and $A$ the weighted adjacency
matrix such that $A_{ij} = A_{ji} \geq 0$ is the weight of the edge
between nodes $i$ and $j$. Let $N$ denote the total number of nodes. We
define the graph's Laplacian matrix $L = D - A$, where $D$ is the
diagonal matrix with $D_{ii} = d_i = \sum_j A_{ij}$ the strength of node $i$.
$L$ is real and symmetric, and therefore diagonalisable: its spectrum
is composed of its set of eigenvalues $(\lambda_l)_{l=0 \ldots N-1}$, which
we sort as $\lambda_0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_{N-1}$, and of $\chi$, the
matrix of its normalized eigenvectors: $\chi = (\chi_0|\chi_1|\cdots|\chi_{N-1})$.
Considering only connected graphs, the eigenvalue $\lambda_0 = 0$ has
multiplicity 1. For more properties of this spectrum, see [14]
(theory) or [15][16] (empirical).
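As a minimal sketch of these definitions, assuming NumPy and a dense adjacency matrix (the helper name `laplacian_spectrum` is ours, not the paper's), the Laplacian and its sorted spectrum can be obtained with a symmetric eigensolver:

```python
import numpy as np


def laplacian_spectrum(A):
    """Combinatorial Laplacian L = D - A of an undirected weighted graph and its
    eigendecomposition (eigenvalues sorted in ascending order)."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric (undirected graph)")
    d = A.sum(axis=1)             # strengths d_i = sum_j A_ij
    L = np.diag(d) - A            # L = D - A
    lam, chi = np.linalg.eigh(L)  # ascending eigenvalues, orthonormal eigenvectors
    return L, lam, chi


# Path graph on 3 nodes: its Laplacian spectrum is {0, 1, 3}.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
L, lam, chi = laplacian_spectrum(A)
```

`numpy.linalg.eigh` already returns the eigenvalues in ascending order, which matches the sorting convention above.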




Fig. 1. Sketch of the example graph as defined in [17]: each node
displayed is in fact a community of 10 nodes. The thickness of each
link is proportional to the total number of links between the two
corresponding communities. See Section 5 for details.

By analogy with the continuous Laplacian operator, whose
eigenfunctions are the continuous Fourier modes and whose eigenvalues
are their squared frequencies, $\chi$ is considered as the matrix of the
graph's Fourier modes, and $(\sqrt{\lambda_l})_{l=0 \ldots N-1}$ as its set of
associated "frequencies". For instance, the graph Fourier transform $\hat{f}$
of a signal $f$ defined on the nodes of the graph reads $\hat{f} = \chi^\top f$.
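The transform and its inverse $f = \chi \hat{f}$ are one matrix product each. A small sketch under the same assumptions (NumPy, dense matrices, hypothetical helper name `graph_fourier`); it uses a constant signal, which on a connected graph only has a component on $\chi_0$:

```python
import numpy as np


def graph_fourier(A, f):
    """Graph Fourier transform f_hat = chi^T f, where chi holds the eigenvectors
    of L = D - A; also returns the 'frequencies' sqrt(lambda_l)."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    lam, chi = np.linalg.eigh(L)
    freqs = np.sqrt(np.clip(lam, 0.0, None))  # clip round-off just below zero
    f_hat = chi.T @ f
    return freqs, chi, f_hat


A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
f = np.ones(3)                  # constant signal on the nodes
freqs, chi, f_hat = graph_fourier(A, f)
f_back = chi @ f_hat            # inverse transform f = chi f_hat
```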

2.2. The Graph Wavelets 

Graph wavelets were developed in [7] from the graph Fourier
transform. The wavelets' construction is based on band-pass filters
defined in the graph Fourier domain, generated by stretching a unique
band-pass filter kernel $g$ by a scale parameter $s > 0$. The matrix
notation of a stretched filter is
$G_s = \mathrm{diag}(g(s\lambda_0), \ldots, g(s\lambda_{N-1}))$. The wavelet basis at
scale $s$ is then defined by

$$\Psi_s = (\psi_{s,0}|\psi_{s,1}|\cdots|\psi_{s,N-1}) = \chi G_s \chi^\top,$$

where $\psi_{s,a}$ is the wavelet centered around node $a$. The wavelet
transform at scale $s$ of a signal $f$ is obtained by decomposing $f$ on
$\Psi_s$. Here, we mainly use the wavelets themselves.

Let us give the intuition behind this definition. At small scales
(small scale parameter $s$), the filter $g(s\,\cdot)$ is stretched out, thus
letting through the high-frequency modes essential to good
localization: the corresponding wavelets describe only their close
neighbourhood in the graph. At large scales (large $s$), the filter
function is compressed around low-frequency modes, which creates
wavelets encoding a coarser description of the local environment.
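A sketch of the wavelet basis $\Psi_s = \chi G_s \chi^\top$, assuming NumPy. The name `wavelet_basis` and the stand-in kernel $x e^{-x}$ are illustrative choices only; the stand-in has a band-pass shape (zero at $x = 0$, decaying for large $x$) but is not the kernel $g$ used in this paper:

```python
import numpy as np


def wavelet_basis(A, kernel, s):
    """Psi_s = chi G_s chi^T with G_s = diag(kernel(s * lambda_l));
    column a of the result is the wavelet psi_{s,a} centred on node a."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    lam, chi = np.linalg.eigh(L)
    G_s = np.diag(kernel(s * lam))
    return chi @ G_s @ chi.T


def toy_kernel(x):
    # Hypothetical stand-in band-pass kernel, NOT the kernel of this paper.
    return x * np.exp(-x)


A = np.diag(np.ones(3), 1)
A = A + A.T                     # path graph on 4 nodes
Psi = wavelet_basis(A, toy_kernel, s=2.0)
```

Because a band-pass kernel vanishes at $x = 0$, each column of $\Psi_s$ sums to (numerically) zero: the wavelets carry no constant component.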

2.3. Band-pass filter kernel 

We use the band-pass filter kernel $g$ proposed in [7]:

$$g(x;\alpha,\beta,x_1,x_2) = \begin{cases} x_1^{-\alpha}x^{\alpha} & \text{for } x < x_1 \\ p(x) & \text{for } x_1 \leq x \leq x_2 \\ x_2^{\beta}x^{-\beta} & \text{for } x > x_2 \end{cases} \qquad (1)$$

Here, $p(x)$ is the unique cubic polynomial interpolation that ensures
the continuity of $g$ and of its derivative $g'$. The parameters $\alpha$,
$\beta$, $x_1$ and $x_2$ are carefully chosen so as to generate appropriate
wavelets for the application at hand. Following [7], we set $x_1 = 1$
and $\alpha = 2$. For the other parameters, we propose a new way to set
them, based on the study of relevant scale boundaries in Section 3.1.
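The cubic $p(x)$ is fully determined by four conditions, $p(x_1) = 1$, $p'(x_1) = \alpha/x_1$, $p(x_2) = 1$ and $p'(x_2) = -\beta/x_2$, which follow from the continuity of $g$ and $g'$ with the two power-law pieces of eq. (1). A minimal NumPy sketch (the name `make_kernel` is hypothetical) solves this 4×4 linear system:

```python
import numpy as np


def make_kernel(alpha=2.0, beta=2.0, x1=1.0, x2=2.0):
    """Band-pass kernel g(x; alpha, beta, x1, x2) of eq. (1).

    The cubic p(x) on [x1, x2] matches g and g' at both ends:
    p(x1) = 1, p'(x1) = alpha / x1, p(x2) = 1, p'(x2) = -beta / x2.
    """
    M = np.array([
        [1.0, x1, x1**2, x1**3],
        [0.0, 1.0, 2 * x1, 3 * x1**2],
        [1.0, x2, x2**2, x2**3],
        [0.0, 1.0, 2 * x2, 3 * x2**2],
    ])
    rhs = np.array([1.0, alpha / x1, 1.0, -beta / x2])
    c = np.linalg.solve(M, rhs)  # p(x) = c0 + c1 x + c2 x^2 + c3 x^3

    def g(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        out = np.empty_like(x)
        low, high = x < x1, x > x2
        mid = ~(low | high)
        out[low] = (np.clip(x[low], 0.0, None) / x1) ** alpha  # x1^-alpha x^alpha
        out[high] = (x2 / x[high]) ** beta                    # x2^beta x^-beta
        xm = x[mid]
        out[mid] = c[0] + c[1] * xm + c[2] * xm**2 + c[3] * xm**3
        return out

    return g


g = make_kernel()  # alpha = beta = 2, x1 = 1, x2 = 2: the values used in [7]
```

With these defaults the solved coefficients give $p(x) = -5 + 11x - 6x^2 + x^3$; the negative inputs clipped to zero only guard against round-off in $\lambda_0$.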



3. SCALE BOUNDARIES, PHYSICAL SCALES AND 
SCALING FUNCTIONS 

This section develops some contributions to the theory and use of 
graph wavelets that are required for multiscale community mining: 
first, a new way to set the parameters of g by studying relevant 
boundaries for the scales s; second, a correspondence between s 
and the actual physical scale; third, we introduce scaling functions 
associated to graph wavelets. 

3.1. Scale Boundaries 

For the band-pass filter kernel $g$ of eq. (1), the parameters $\beta$ and
$x_2$ remain to be set. Instead of following [7], where $\beta = 2$ and
$x_2 = 2$, we propose a different way to set them, based on a remark
from spectral clustering [18]: the eigenvector $\chi_1$ associated with
the smallest non-zero eigenvalue $\lambda_1$ is the most important one for
community mining, as it contains information on the coarsest
description of the graph. It is therefore sound to define the maximum
scale parameter $s_{max}$ such that the filter function $g(s_{max}x)$ starts
decaying as a power law after $x = \lambda_1$, hence $s_{max} = x_2/\lambda_1$.
Then, we require this maximum scale to be highly selective around
$\lambda_1$; this is achieved by imposing that the eigenmode for $\lambda_2$ (and
all the subsequent ones) is strongly attenuated. Choosing an
attenuation by a factor 10, i.e. $g(s_{max}\lambda_1) = 10\,g(s_{max}\lambda_2)$, and
using eq. (1) with $g(x_2) = 1$, we obtain $(\lambda_2/\lambda_1)^{\beta} = 10$, hence
$\beta = 1/\log_{10}(\lambda_2/\lambda_1)$. In practice, we impose a minimum value of
2 for $\beta$ (the value from [7]).

For community mining, it is also useful to keep some information
from $\chi_1$ at every scale, so that all wavelets retain part of a
possible large-scale community structure. Smaller scales $s$ stretch
the filter towards higher eigenvalues, hence gradually reducing the
contribution of $\chi_1$ to smaller-scale wavelets. This leads us to
propose as minimum scale $s_{min}$ the one below which $g(s\lambda_1)$ becomes
smaller than 1. Using eq. (1), this gives $s_{min} = x_1/\lambda_1 = 1/\lambda_1$.
Imposing also that $s_{min}$ spans the whole range of eigenvalues (so
that the wavelet basis does not omit any eigenvalue), we should have
$s_{min}\lambda_{N-1} = x_2$. This sets $x_2 = \lambda_{N-1}/\lambda_1$ in our study, and
thereby $s_{max} = \lambda_{N-1}/\lambda_1^2$. Fig. 2(a) shows examples of band-pass
filters $g(s\,\cdot)$ obtained with these scale boundaries and parameters.
This figure, like every figure in this paper, was computed using the
graph shown in Fig. 1.
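Once the eigenvalues are known, the rules above take only a few lines. A sketch, assuming NumPy and the ascending eigenvalues of a connected graph with $\lambda_2 > \lambda_1$. The function name and the example spectrum are made up for illustration:

```python
import numpy as np


def scale_boundaries(lam, x1=1.0, beta_min=2.0):
    """Kernel parameters and scale range of Section 3.1, computed from the
    ascending Laplacian eigenvalues lam of a connected graph."""
    lam1, lam2, lam_last = lam[1], lam[2], lam[-1]
    if lam2 <= lam1:
        raise ValueError("needs lambda_2 > lambda_1 to set beta")
    x2 = lam_last / lam1                                # s_min * lambda_{N-1} = x2
    beta = max(1.0 / np.log10(lam2 / lam1), beta_min)   # g(s_max lam1) = 10 g(s_max lam2)
    s_min = x1 / lam1                                   # g(s lam1) < 1 below s_min
    s_max = x2 / lam1                                   # = lambda_{N-1} / lambda_1**2
    return {"x2": x2, "beta": beta, "s_min": s_min, "s_max": s_max}


lam = np.array([0.0, 1.0, 2.0, 5.0, 10.0])  # made-up spectrum, illustration only
b = scale_boundaries(lam)
```

With this spectrum, $x_2/(s_{max}\lambda_2) = 0.5$ and $0.5^{\beta} = 0.1$, which is the 10× attenuation of the $\lambda_2$ mode at $s_{max}$ imposed above.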

3.2. Actual physical scale 

The issue is now to estimate, for each scale parameter $s$, the actual
physical scale (or size in number of nodes) of the corresponding
wavelets. For classical wavelets at a given scale, the relevant
physical scale in time follows from their constant extension in the
time domain. For graph wavelets, it is trickier: the extension of a
wavelet in number of nodes changes with its location in the graph.

Our proposal is to build the physical scale of graph wavelets on the
nodal domains of the Laplacian. Nodal domains are the regions where a
solution of the classical Helmholtz equation (i.e., a Laplacian
eigenfunction) does not change sign, and it is known from the Courant
Nodal Theorem that the $l$-th eigenmode $\chi_l$ ($l \neq 0$) has between 2
and $l + 1$ nodal domains [19]. Nodal domains also exist for graphs, as
the maximal connected components on which $\chi_l$ does not change sign.
For each $\chi_l$, one can consider its average nodal domain size as its
period. As there are no simple analytical expressions for the number
of nodal domains [20][21], we must either compute it numerically
using algorithms from [20], or keep the simple upper bound of the
Courant Nodal Theorem: $\chi_l$ has $N_d(l) = l + 1$ nodal domains, and a
rough estimate of its period is thereby $N/N_d(l)$.
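When the Courant bound is too rough, nodal domains can be counted directly. The sketch below is a simple approach, not the algorithm of [20]: it splits the nodes by the sign of $\chi_l$ and counts the connected components of the two induced subgraphs with SciPy's `connected_components`. Treating entries within a tolerance of zero as belonging to no domain is one of several possible conventions; `count_nodal_domains` is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def count_nodal_domains(A, v, tol=1e-12):
    """Number of nodal domains of the vector v on the graph with adjacency A:
    maximal connected subgraphs on which v keeps a strict sign.
    Entries with |v_i| <= tol are treated as zeros and belong to no domain."""
    A = np.asarray(A, dtype=float)
    total = 0
    for mask in (v > tol, v < -tol):          # positive part, then negative part
        idx = np.flatnonzero(mask)
        if idx.size == 0:
            continue
        sub = csr_matrix((A[np.ix_(idx, idx)] > 0).astype(float))  # induced subgraph
        n_comp, _ = connected_components(sub, directed=False)
        total += n_comp
    return total


A = np.diag(np.ones(3), 1)
A = A + A.T                                  # path graph on 4 nodes
lam, chi = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)
counts = [count_nodal_domains(A, chi[:, l]) for l in range(4)]
```

On this 4-node path graph, the counts are 1, 2, 3 and 4 for $l = 0, \ldots, 3$, so they reach the Courant bound $l + 1$.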
Fig. 2. These figures are computed from the graph shown in Fig. 1.
(a) Band-pass filter functions $g$ and (b) their associated low-pass
filter functions $h$ for six different scales within the scale
boundaries: $s$ = 3.6 ($s_{max}$), 2.4, 1.5, 0.9, 0.6 and 0.4 ($s_{min}$); the
parameters from Section 3.1 are $x_1 = 1$, $x_2 = 10$, $\alpha = 2$ and $\beta = 35$.
Actual physical size of (c) wavelets and (d) scaling functions in
number of nodes. The thick line is the median, and the dashed lines
the minimum and maximum values. The dotted horizontal lines
correspond to the three theoretical hierarchical levels of the graph.

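Consider a wavelet $\psi_{s,a}$ and its Fourier transform $\hat\psi_{s,a}$; its
physical scale reads

$$s_{ph}(\psi_{s,a}) = \frac{N}{\langle N_d(l) \rangle_{\hat\psi_{s,a}}},$$

where $\langle N_d(l) \rangle_{\hat\psi_{s,a}}$ denotes the average number of nodal
domains over the Fourier modes $l$, weighted by the spectral content
$\hat\psi_{s,a}$ of the wavelet.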

Fig. 2(c) shows $s_{ph}(\psi_{s,a})$ as a function of $s$ for every node $a$.
To associate a single size with each scale $s$, one can use the median
(represented by the thick line) or, alternatively, the minimal and
maximal values as bounds of the physical scale (the dashed lines).
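A sketch of the physical-scale estimate for a set of wavelets at one scale, assuming NumPy. The exact weighting of the average $\langle N_d(l) \rangle$ is not recoverable from this text, so the sketch assumes weights $|\hat\psi_{s,a}(l)|^2$ (the spectral energy of the wavelet) together with the Courant bound $N_d(l) = l + 1$; `physical_scales` is a hypothetical name. The two extreme "wavelets" in the example are the lowest and the highest Fourier modes themselves.

```python
import numpy as np


def physical_scales(Psi, chi, Nd=None):
    """Physical scale (number of nodes) of each column of Psi.

    ASSUMPTION: the average of N_d(l) is weighted by the spectral energy
    |psi_hat(l)|^2 of each column; N_d defaults to the Courant bound l + 1.
    """
    N = chi.shape[0]
    Nd = np.arange(1, N + 1) if Nd is None else np.asarray(Nd, dtype=float)
    psi_hat = chi.T @ Psi                  # column a: Fourier transform of column a
    w = psi_hat**2
    mean_Nd = (Nd[:, None] * w).sum(axis=0) / w.sum(axis=0)
    return N / mean_Nd


A = np.diag(np.ones(3), 1)
A = A + A.T                                # path graph on 4 nodes
lam, chi = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)
s_ph = physical_scales(chi[:, [0, 3]], chi)
```

The lowest mode spans the whole graph ($N = 4$ nodes) and the highest one a single node. With real wavelets $\Psi_s$, `np.median(physical_scales(Psi_s, chi))` gives the single size per scale discussed above.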

3.3. Scaling functions 

A third point is to define scaling functions on graphs. In Hammond's
work [7], a unique low-pass filter is used, whose sole function is to
capture the null-frequency parts needed to complete the
reconstruction of a signal from its wavelet transform. This filter is
arbitrary and not scale-dependent. In the following, we introduce
scaling functions by analogy with the case of the continuous wavelet
transform [22][23]. To our knowledge, this is the first introduction
of a proper scaling function for graph wavelets. For that, we
introduce scale-dependent low-pass filters, noted $H_s$, derived from a
unique low-pass filter kernel $h$ stretched by the scale parameter $s$:
$H_s = \mathrm{diag}(h(s\lambda_0), h(s\lambda_1), \ldots, h(s\lambda_{N-1}))$. Imposing that

$$h(x)^2 = \int_x^{+\infty} \frac{g(t)^2}{t}\,dt,$$

as for usual wavelets, the columns $\phi_{s,a}$ of $\Phi_s = \chi H_s \chi^\top$ may be
understood as scaling functions at scale $s$. Fig. 2(b) shows the
low-pass filters corresponding to these scaling functions, and
Fig. 2(d) their estimated physical scale (using the same approach as
for wavelets) as a function of the scale $s$.
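A numerical sketch of the relation between $h$ and $g$, using SciPy's `quad` for the integral. The names `make_lowpass` and `scaling_basis` and the stand-in kernel $g(t) = t e^{-t}$ are illustrative assumptions. The stand-in is chosen because the integral then has the closed form $h(x)^2 = e^{-2x}(x/2 + 1/4)$, which makes the result easy to check:

```python
import numpy as np
from scipy.integrate import quad


def make_lowpass(g):
    """Low-pass kernel h with h(x)^2 = int_x^inf g(t)^2 / t dt, for a scalar
    band-pass kernel g (float -> float) with g(0) = 0."""
    def h(x):
        val, _ = quad(lambda t: g(t) ** 2 / t if t > 0 else 0.0,
                      max(float(x), 0.0), np.inf, limit=200)
        return np.sqrt(val)
    return h


def scaling_basis(A, h, s):
    """Phi_s = chi H_s chi^T with H_s = diag(h(s * lambda_l))."""
    A = np.asarray(A, dtype=float)
    lam, chi = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)
    H_s = np.diag([h(s * l) for l in lam])
    return chi @ H_s @ chi.T


# Hypothetical stand-in kernel, NOT the kernel of eq. (1).
h = make_lowpass(lambda t: t * np.exp(-t))
A = np.diag(np.ones(3), 1)
A = A + A.T                      # path graph on 4 nodes
Phi = scaling_basis(A, h, s=1.0)
```

Unlike the wavelets, each column of $\Phi_s$ sums to $h(0) > 0$, so the scaling functions keep the low-frequency content. A vectorized kernel returning arrays needs a small wrapper that maps a float to a float before being passed to `make_lowpass`.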

4. MULTISCALE COMMUNITY MINING 
4.1. Elements of the method 

Clustering techniques in data mining generally rely on four choices
[24]: the feature vector of each object under consideration; a
distance quantifying whether vectors are close or not; a clustering
algorithm separating the objects into several clusters; and an
evaluation function assessing whether the proposed clusters are
meaningful or not. Let us describe our multiscale community mining
through these four key points.

1. Scale-dependent feature vectors. The aim is to group together
nodes whose topological environments are similar. As the local
information is encoded in the wavelets and scaling functions, we
define, for each scale $s$ and each node $a$, its feature vector as
either its associated wavelet $\psi_{s,a}$ or its associated scaling
function $\phi_{s,a}$, after normalization.

2. Correlation distance. The correlation distance (equal to 1 minus
the correlation coefficient) is chosen to compare feature vectors.
Experimentally, it yields better results than, e.g., the Euclidean
distance.

3. Clustering algorithm. We use a hierarchical "complete-linkage"
clustering algorithm [24][25]. An additional connectivity constraint
is added [26]: a node cannot be clustered in a group of nodes with
which it has no topological connection. This hierarchical algorithm
outputs a dendrogram. As we do not know beforehand how many clusters
there are in the network, we have to consider each possible
subdivision of the dendrogram.

4. Evaluation function: scale-dependent modularity. As an
evaluation function of the relevance of a given clustering, we
introduce a scale-dependent definition of the modularity. Let $S$ be a
matrix of size $N \times J$ coding for a community clustering:
$S = (1_{C_1}|1_{C_2}|\cdots|1_{C_J})$, where $1_{C_j}$ is the binary indicator
function of community $C_j$ (i.e., $1_{C_j}(i) = 1$ if node $i$ is in $C_j$,
else 0). The modularity matrix $B$ is defined as [4]:

$$B = \frac{1}{2m}\left(A - \frac{d\,d^\top}{2m}\right),$$

where $m$ is the sum of all weights in the graph and $d$ the vector of
strengths of the nodes. The classical modularity is then computed as
$Q = \mathrm{tr}(S^\top B S)$.

Here, we introduce a filtered version of $B$ at each scale $s$:

$$B^h_s = \Phi_s B = \chi H_s \chi^\top B.$$

A scale-dependent modularity is then $Q^h_s = \mathrm{tr}(S^\top B^h_s S)$.
Alternatively, the wavelet filter $g$ may be used instead of the
scaling-function filter $h$, which gives $B^g_s = \Psi_s B$ and
$Q^g_s = \mathrm{tr}(S^\top B^g_s S)$. Results with both choices are shown in
Section 5.

Choosing the subdivision of the dendrogram (from point 3) that
maximizes the scale-dependent modularity leads to a community
structure whose scale is imposed by the filter at scale $s$. We thereby
circumvent the problem of classical modularity, which imposes its own
uncontrolled scale [5][6].
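A sketch of the classical and scale-dependent modularities, assuming NumPy; the function names are ours. Passing a low-pass kernel $h$ as `kernel` gives $B^h_s$, passing the band-pass $g$ gives $B^g_s$. An all-pass filter recovers the classical $Q$, which serves as a sanity check here:

```python
import numpy as np


def modularity_matrix(A):
    """B = (A - d d^T / 2m) / 2m, with d the node strengths and 2m = sum_i d_i."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    two_m = d.sum()
    return (A - np.outer(d, d) / two_m) / two_m


def filtered_modularity_matrix(A, kernel, s):
    """B_s = chi K_s chi^T B with K_s = diag(kernel(s * lambda_l))."""
    A = np.asarray(A, dtype=float)
    lam, chi = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)
    return chi @ np.diag(kernel(s * lam)) @ chi.T @ modularity_matrix(A)


def modularity(B, labels):
    """tr(S^T B S), S being the N x J binary community-indicator matrix."""
    _, inv = np.unique(labels, return_inverse=True)
    S = np.eye(inv.max() + 1)[inv]        # row i: indicator of node i's community
    return np.trace(S.T @ B @ S)


# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
Q = modularity(modularity_matrix(A), labels)
# With an all-pass filter, the filtered modularity reduces to the classical one.
Q_allpass = modularity(filtered_modularity_matrix(A, np.ones_like, s=1.0), labels)
```

For this toy graph, splitting into the two triangles gives $Q = 2\,(3/7 - 1/4) = 5/14$.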

4.2. Protocol for multiscale community mining 

The proposed method is summarized as follows. 

1. Choose the scale $s$ at which one wants to study the community
structure.

2. Compute the normalized wavelets (resp. the scaling functions)
at this scale, and use them as feature vectors.

3. Cluster the feature vectors using the correlation distance and
hierarchical complete-linkage clustering, and output the induced
dendrogram.

4. Evaluate each subdivision of the dendrogram with the filtered
modularity, and keep the one yielding the maximum filtered
modularity ($Q^h_s$ or $Q^g_s$).

The method outputs the community structure at scale $s$. To obtain a
proper multiscale description, repeat the procedure with several
scales of interest (between the scale boundaries discussed in
Section 3.1).
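Steps 3 and 4 can be sketched as follows, assuming NumPy and SciPy. The clustering is a naive $O(N^3)$ complete-linkage loop written for clarity. It includes the connectivity constraint of Section 4.1, so two clusters merge only if an edge joins them. All function names are hypothetical. In the example, closed neighbourhoods $A + I$ stand in for the normalized wavelets and the classical $B$ stands in for $B^h_s$ or $B^g_s$, purely for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def correlation_distances(F):
    """N x N correlation distances (1 - Pearson correlation) between the columns of F."""
    F = np.asarray(F, dtype=float)
    F = F / np.linalg.norm(F, axis=0, keepdims=True)  # feature normalization
    return squareform(pdist(F.T, metric="correlation"))


def constrained_complete_linkage(D, A):
    """Complete-linkage clustering in which two clusters may merge only if an
    edge of A joins them. Returns every partition (label array) of the
    dendrogram, from N singletons down to one cluster per connected component."""
    N = D.shape[0]
    Dc = np.array(D, dtype=float)            # complete-linkage cluster distances
    Ac = np.asarray(A) > 0                   # cluster adjacency
    np.fill_diagonal(Ac, False)
    active = np.ones(N, dtype=bool)
    labels = np.arange(N)
    partitions = [labels.copy()]
    while True:
        cand = Ac & active[:, None] & active[None, :]
        if not cand.any():
            break
        i, j = np.unravel_index(np.argmin(np.where(cand, Dc, np.inf)), Dc.shape)
        merged = np.maximum(Dc[i], Dc[j])    # complete linkage keeps the largest distance
        Dc[i, :], Dc[:, i] = merged, merged
        adj = Ac[i] | Ac[j]
        Ac[i, :], Ac[:, i] = adj, adj
        Ac[i, i] = False
        active[j] = False                    # cluster j is absorbed into cluster i
        labels[labels == j] = i
        partitions.append(np.unique(labels, return_inverse=True)[1])
    return partitions


def mine_communities(F, A, B_s):
    """Steps 3-4: build the dendrogram, then keep the cut maximising tr(S^T B_s S)."""
    best_labels, best_q = None, -np.inf
    for labels in constrained_complete_linkage(correlation_distances(F), A):
        S = np.eye(labels.max() + 1)[labels]  # N x J community-indicator matrix
        q = np.trace(S.T @ B_s @ S)
        if q > best_q:
            best_labels, best_q = labels, q
    return best_labels, best_q


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
d = A.sum(axis=1)
B = (A - np.outer(d, d) / d.sum()) / d.sum()  # stand-in for a filtered B_s
F = A + np.eye(6)                             # stand-in for normalized wavelets
labels, q_best = mine_communities(F, A, B)
```

Each pass of the merge loop yields one cut of the dendrogram, so the final scan evaluates every subdivision literally. This is also where part of the computational cost discussed in the conclusion comes from.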
Fig. 3. Normalized mutual information (nmi) between the known
theoretical partition and the empirical one obtained when cutting the
dendrogram with a priori knowledge of the total number of clusters.
Left: using wavelets. Right: using scaling functions.
Fig. 4. Scale-dependent (left) and classical (right) modularity as a
function of the number of clusters for three different scales $s$
($s$ = 1.2 in dot-dashed blue, 2.4 in dashed red, 3.3 in solid black).



5. TEST OF THE METHOD ON A HIERARCHICAL 
BENCHMARK NETWORK 

Following [9], we test this method on a benchmark defined in [17]
for which the multiscale community structure is known. This graph
contains 640 nodes, organized in three hierarchical levels: there are
64 small communities of 10 nodes each (the finest scale), embedded
in 16 communities of 40 nodes each (the intermediate scale),
themselves embedded in 4 communities of 160 nodes each (the coarsest
scale). The intra-community and inter-community link densities are
controlled by a single parameter $p$, which we set to 1 in this
example. A sketch of this graph is shown in Fig. 1.

Following the proposed protocol, we obtain in step 3 one dendrogram
per scale. Let us first assess the validity of the dendrograms. For
each of the three levels of description, we compute the normalized
mutual information (nmi) [27] between the known theoretical partition
and the empirical one obtained when cutting the dendrogram with a
priori knowledge of the total number of clusters. We plot it as a
function of the scale parameter in Fig. 3. The black squares stand
for the fine level of description, the red triangles for the
intermediate one, and the blue circles for the coarsest one. An nmi
value of 1 means that both partitions are identical. Results are
obtained using wavelets (left) or scaling functions (right). We see
that there always exists an interval of scales for which one may
recover the theoretical partition from the dendrogram. When using
scaling functions, the intervals at value 1 are slightly larger,
especially at small scales. This validates the first three steps of
our protocol: the dendrograms do contain the relevant scale-dependent
information.
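For reference, a sketch of the normalized mutual information in NumPy, assuming the common normalization $2I(a;b)/(H(a)+H(b))$. Whether [27] uses exactly this form is an assumption here, and `nmi` is a hypothetical name:

```python
import numpy as np


def nmi(a, b):
    """Normalized mutual information 2 I(a; b) / (H(a) + H(b)) between two
    partitions given as label arrays of equal length."""
    _, a = np.unique(a, return_inverse=True)
    _, b = np.unique(b, return_inverse=True)
    P = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(P, (a, b), 1.0)              # contingency table
    P /= a.size                            # joint distribution
    pa, pb = P.sum(axis=1), P.sum(axis=0)  # marginals (all > 0 by construction)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz]))
    h = -np.sum(pa * np.log(pa)) - np.sum(pb * np.log(pb))
    return 1.0 if h == 0 else 2.0 * mi / h


theory = [0, 0, 0, 1, 1, 1]
found = [5, 5, 5, 2, 2, 2]   # same partition, different labels
score = nmi(theory, found)
```

The score ignores label names, which is why a value of 1 means that the two partitions coincide.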

We now turn to the next question: without our a priori knowledge of
the theoretical number of clusters, does maximizing the proposed
scale-dependent modularity uncover the correct partitions? Fig. 4
displays the scale-dependent (left) and classical (right) modularity
as a function of the number of clusters kept in the dendrogram, for
three different scales $s$. The maxima that define the best partition
at each scale differ significantly between the two modularities. For
classical modularity, the maximum is always near the same number of
clusters (hence pointing to communities always having the same mean
size). For the scale-dependent modularity, it changes with $s$: smaller
values of $s$ point to a higher number of clusters, hence smaller
clusters. Here, we used $Q^h_s$, but results are similar with wavelets.
This illustrates why we need a scale-dependent modularity.

Fig. 5. (a), (b) Average number (over 100 bootstrap samples of the
graph) of uncovered communities. (c), (d) Average size of uncovered
communities. Left: using wavelets. Right: using scaling functions.

To obtain quantitative results for the method, we create bootstrap
samples of the graph by randomly adding ±10% of the weight of each
link (U |28). We plot in Fig. 5 the average number (over 100 bootstrap
samples) of uncovered communities (top plots) and the average size of
the uncovered communities (bottom plots). The dotted horizontal lines
correspond to the three theoretical hierarchical levels. The method
successfully recovers the community structure, both in terms of the
number of communities and of their size. Using scaling functions
leads to better results, with larger and better-defined intervals
where these numbers and sizes are correctly estimated.
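The text does not specify the distribution of the ±10% perturbation. The sketch below assumes that each existing edge weight is multiplied by $(1 + u)$, with $u$ drawn uniformly in $[-0.1, 0.1]$ and the same $u$ used for $(i, j)$ and $(j, i)$. `perturb_weights` is a hypothetical name and the toy graph is for illustration only.

```python
import numpy as np


def perturb_weights(A, rel=0.10, rng=None):
    """Bootstrap-like copy of A in which each existing edge weight is multiplied
    by (1 + u), u ~ Uniform(-rel, rel) (ASSUMED distribution); the edge set and
    the symmetry of A are preserved."""
    rng = np.random.default_rng(rng)
    A = np.asarray(A, dtype=float)
    U = np.triu(rng.uniform(-rel, rel, size=A.shape), 1)
    U = U + U.T                         # same perturbation for (i, j) and (j, i)
    return A * (1.0 + U)


A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
samples = [perturb_weights(A, rng=seed) for seed in range(100)]
```

Each sample would then go through the full protocol, and the number and size of the uncovered communities would be averaged over the samples.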

Finally, note that for all $s$, the median physical scale defined in
Section 3.2 and shown in Fig. 2 is consistent with the size of the
uncovered communities: the wavelets (resp. scaling functions) at
scale $s$ detect communities of sizes around (resp. strictly below) the
corresponding $s_{ph}$.

6. CONCLUSION 

We proposed a method for multiscale community mining in graphs for
which no parameter needs to be adjusted. The mathematical soundness
of graph wavelets, on which this method is based, is one of its great
assets. The first aspect of our work was to complement certain
aspects of graph wavelets. First, since the parameters and scale
boundaries are set from the graph, the only input needed to obtain
the whole multiscale structure of a graph is the number and
distribution of the scales to be studied between $s_{min}$ and $s_{max}$.
Second, if one wants only the community structure at an actual
physical scale of, say, 10 nodes, our proposed physical scales
associated with wavelets and scaling functions can be used to
estimate the corresponding scale parameter $s$. Third, our
introduction of scaling functions associated with graph wavelets
leads to slightly better results for scale-dependent community
mining.

The weakness of the method is its computational cost. Two
calculations are particularly costly: the diagonalisation of the
Laplacian, and the evaluation of each possible subdivision of the
dendrograms. Regarding the first problem, Hammond et al. [7] proposed
a fast wavelet transform based on Chebyshev polynomial approximation
that does not require the diagonalisation of the Laplacian. This
approximation can easily be used in our method. For the dendrogram
mining, instead of looking at all possible subdivisions, future work
may consider only the subdivisions given by the largest gaps of the
dendrograms or, as the scale-dependent modularity has a bell-shaped
curve (see Fig. 4), search only for a large local maximum.
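A sketch of the Chebyshev idea of [7], assuming NumPy. This is our reading of the technique, not Hammond et al.'s code. The map $x \mapsto g(sx)$ is approximated on $[0, \lambda_{max}]$ by a truncated Chebyshev series, which is then applied to a signal through the three-term recurrence, using only products with $L$. Any upper bound on $\lambda_{N-1}$ works for $\lambda_{max}$, and $2\max_i d_i$ is a standard one. The stand-in kernel $e^{-x}$ is used only so that the result can be checked against the exact spectral computation. Applying the series to a delta on node $a$ gives the filtered response centred on $a$, and applying it to the identity matrix would give the whole filtered basis.

```python
import numpy as np


def cheby_coeffs(kernel, s, lmax, K=30):
    """Chebyshev coefficients c_0..c_K of x -> kernel(s * x) on [0, lmax],
    computed at the K + 1 Chebyshev nodes."""
    M = K + 1
    a = lmax / 2.0
    theta = np.pi * (np.arange(1, M + 1) - 0.5) / M
    vals = kernel(s * (a * np.cos(theta) + a))
    return np.array([2.0 / M * np.sum(np.cos(k * theta) * vals) for k in range(K + 1)])


def cheby_apply(L, f, c, lmax):
    """Approximate chi diag(kernel(s*lambda)) chi^T f using only products with L,
    through the recurrence of the shifted polynomials T_k((L - a I) / a)."""
    a = lmax / 2.0
    T_prev = f
    T_curr = (L @ f - a * f) / a
    out = 0.5 * c[0] * T_prev + c[1] * T_curr
    for k in range(2, len(c)):
        T_next = 2.0 * (L @ T_curr - a * T_curr) / a - T_prev
        out = out + c[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out


A = np.diag(np.ones(4), 1)
A = A + A.T                                   # path graph on 5 nodes
L = np.diag(A.sum(axis=1)) - A
lmax = 2.0 * A.sum(axis=1).max()              # upper bound on the Laplacian spectrum
f = np.zeros(5)
f[0] = 1.0                                    # delta on node 0
c = cheby_coeffs(lambda x: np.exp(-x), s=1.0, lmax=lmax, K=30)
approx = cheby_apply(L, f, c, lmax)

lam, chi = np.linalg.eigh(L)                  # exact result, for comparison only
exact = chi @ (np.exp(-lam) * (chi.T @ f))
```

For sparse graphs, `L @ f` can be a sparse matrix-vector product, so the cost grows with the number of edges times $K$ instead of requiring a full eigendecomposition.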